
operator: KBS API for LUKS key registration#248

Open
iroykaufman wants to merge 1 commit into
trusted-execution-clusters:main from
iroykaufman:luks-key

Conversation

@iroykaufman
Contributor

@iroykaufman iroykaufman commented May 4, 2026

Currently, every time the LUKS key is updated, the operator patches the trustee deployment, which causes a restart of the pod. This PR introduces a way to avoid this by setting the LUKS key using the KBS API.

Core implementation points:

  • Right before the operator first deploys the trustee, create an auth key for the KBS API and save it as a kube secret.
  • Replace the patch mechanism with API calls.
  • Add a reconcile loop that watches for changes in the trustee deployment and re-syncs any LUKS keys that were lost.

Tests:

  • test_luks_key_sync - this test checks that the LUKS key is first sent to trustee and validates that after trustee restarts, the keys are sent again. Also, at the end, I delete one of the machines and check that its secret has been deleted.

Summary by Sourcery

Integrate KBS API–based LUKS key management for trustee, replacing deployment patching with authenticated API calls and adding automatic resync on trustee restarts.

New Features:

  • Introduce authenticated KBS API integration to store and manage per-machine LUKS keys instead of mounting secrets into the trustee deployment.
  • Add a controller that watches the trustee deployment and re-synchronizes all machine LUKS keys to KBS when the trustee becomes ready.
  • Generate and manage an Ed25519 keypair as a Kubernetes secret for authenticating KBS API access from the operator.

Enhancements:

  • Label and configure the trustee (KBS) service and deployment consistently via a shared label selector constant and updated volume templates.
  • Switch KBS resource storage to the kvstorage plugin and configure KBS admin to use the public auth key for API access.
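The second bullet can be sketched as a kbs-config.toml fragment. This is a hedged sketch only: the auth_public_key name and /key/public.pub path appear in this PR's description, but the section layout is an assumption about the KBS config schema, not copied from the diff:

```toml
# Sketch of the relevant operator/src/kbs-config.toml changes (assumed layout).

[admin]
# Public half of the operator-generated Ed25519 keypair, mounted into the
# KBS pod from the trustee-auth Secret; the operator authenticates its
# admin API calls with the matching private key.
auth_public_key = "/key/public.pub"

[[plugins]]
# Store resources such as per-machine LUKS keys via the kvstorage plugin
# instead of secrets mounted into the trustee deployment.
name = "resource"
type = "kvstorage"
```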

Build:

  • Pin compute-pcrs-lib to a specific Git revision for reproducible builds and add kbs-client as a new operator dependency, along with installing perl in the container image for the build tooling.

Tests:

  • Add an end-to-end test that verifies initial LUKS key upload, re-sync after trustee restart, and LUKS key deletion on machine removal, plus unit tests for Ed25519 key generation.

@openshift-ci

openshift-ci Bot commented May 4, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: iroykaufman

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Contributor

@Jakob-Naucke Jakob-Naucke left a comment

Thank you for creating this, and thank you for already creating a test. Make sure that linting & build/unit tests pass.

Comment thread operator/src/trustee.rs Outdated
pub async fn launch_trustee_sync_controller(client: Client) {
let deployments: Api<Deployment> = Api::default_namespaced(client.clone());
let watcher_config = watcher::Config {
label_selector: Some("app=kbs".to_string()),
Contributor

nit: maybe a constant also used in generate_kbs_{service,deployment} is better

Comment thread operator/src/trustee.rs Outdated
use serde::{Serialize, Serializer};
use serde_json::{Value::String as JsonString, json};
use std::collections::BTreeMap;
use kbs_client;
Contributor

Contributor Author

Right

@iroykaufman iroykaufman force-pushed the luks-key branch 2 times, most recently from dc726d8 to c5ec81d on May 6, 2026 at 15:22
@iroykaufman
Contributor Author

The CI is failing because we need some Perl dependency. I solved this by adding RUN dnf install -y perl-FindBin perl-core to the Containerfile. @Jakob-Naucke, any idea what the best approach is to solve it for the CI?

@Jakob-Naucke
Contributor

@iroykaufman these tests run in (Ubuntu) GHA containers that aren't affected by the Containerfile (this would get picked up in integration tests once I manage to fix them). You'll need to add them to the steps in .github/rust.yml:

steps:
  - name: "Install OpenSSL dependencies"
    run: apt-get install -y …

@Jakob-Naucke
Contributor

@iroykaufman whoops, you can set your own build container, and we do, and it's a Fedora container, so it should be dnf as usual. You did set it in the right place though, afaict.
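Per the correction above, the workflow step would look roughly like this (hedged sketch; the .github/rust.yml location and package names come from the surrounding comments, not from the actual diff):

```yaml
steps:
  - name: "Install Perl build dependencies"
    # Fedora build container, so dnf rather than apt-get
    run: dnf install -y perl-FindBin perl-core
```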

@Jakob-Naucke
Contributor

Jakob-Naucke commented May 7, 2026

@iroykaufman thanks for the updates. Integration tests are failing because these packages are missing there too. My intuition is that those shouldn't need to be built on the host at all, so let me look into that first.

Edit: before the next push you can also still look into the lint failures.

@iroykaufman
Contributor Author

@iroykaufman thanks for the updates. Integration tests are failing because these packages are missing there too. My intuition is that those shouldn't need to be built on the host at all, so let me look into that first.

Thanks, let me know and then I'll push with the lint fix

@iroykaufman
Contributor Author

iroykaufman commented May 7, 2026

@Jakob-Naucke maybe updating the container file in buildroot with the OpenSSL dependency is better than adding them in the GitHub workflow files. WDYT?

@Jakob-Naucke
Contributor

@Jakob-Naucke maybe updating the container file in buildroot with the OpenSSL dependency is better than adding them in the GitHub workflow files. WDYT?

Yes. I got confused by first thinking we were on GHA's Ubuntu containers, but the buildroot is the better place.

@Jakob-Naucke
Contributor

and the CI host does need these to compile the integration tests, which is slightly not how I thought cargo dependencies worked, but alas, these packages are now installed

- Right before the operator first deploys the trustee, create an auth key for the KBS API and save it as a kube secret.
- Replace the patch mechanism with API calls.
- Add a reconcile loop that watches for changes in the trustee deployment and re-syncs any LUKS keys that were lost.

Test: test_luks_key_sync - this test checks that the LUKS key is first sent to trustee and
validates that after trustee restarts, the keys are sent again.

Signed-off-by: Roy Kaufman <rkaufman@redhat.com>
@iroykaufman
Contributor Author

I deleted the changes for the GitHub workflow and opened PR #9. When this is merged, the CI should work.

@openshift-ci

openshift-ci Bot commented May 7, 2026

@iroykaufman: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: ci/prow/operator-lifecycle-verify · Commit: 472438d · Required: true · Rerun command: /test operator-lifecycle-verify

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-ci

openshift-ci Bot commented May 7, 2026

PR needs rebase.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@alicefr
Contributor

alicefr commented May 11, 2026

@sourcery-ai review

@sourcery-ai

sourcery-ai Bot commented May 11, 2026

Reviewer's Guide

Implements KBS API–based LUKS key management so trustee no longer needs its deployment patched on each key update, introduces an auth keypair/secret for KBS, adds a controller to resync machine LUKS keys when trustee pods restart, switches KBS storage to kvstorage, and adds an end‑to‑end test covering LUKS key sync and deletion.

Sequence diagram for LUKS key registration via KBS API

sequenceDiagram
    actor User as ClusterOperator
    participant K8s as KubernetesAPI
    participant RegSrv as RegisterServerController
    participant Trustee as TrusteeModule
    participant KBS as KBSService

    User->>K8s: Create/Update Machine
    K8s-->>RegSrv: Machine event
    RegSrv->>RegSrv: keygen_reconcile
    RegSrv->>Trustee: generate_secret(client, id, owner_reference)
    Trustee->>K8s: Create Secret id (root key)
    K8s-->>Trustee: Secret id created

    RegSrv->>Trustee: send_secret(client, id)
    Trustee->>K8s: Get Secret id
    K8s-->>Trustee: Secret id (root data)
    Trustee->>K8s: Get Secret trustee-auth
    K8s-->>Trustee: Secret trustee-auth (private.key)
    Trustee->>KBS: set_resource(url, auth_key, resource_bytes, path, [])
    KBS-->>Trustee: 201 Created
    Trustee-->>RegSrv: Ok
    RegSrv-->>K8s: Reconcile result Action::await_change()

Sequence diagram for LUKS key resync on trustee restart

sequenceDiagram
    participant K8s as KubernetesAPI
    participant SyncCtl as TrusteeSyncController
    participant TrusteeDep as TrusteeDeployment
    participant Trustee as TrusteeModule
    participant KBS as KBSService

    Note over TrusteeDep,K8s: TrusteeDeployment restarted
    K8s-->>SyncCtl: Deployment event (label app=kbs)
    SyncCtl->>SyncCtl: trustee_deployment_reconcile
    SyncCtl->>TrusteeDep: Read status (ready_replicas, replicas)
    TrusteeDep-->>SyncCtl: Status ready
    SyncCtl->>Trustee: sync_all_machine_luks_key(client)

    Trustee->>K8s: List Secrets (default namespace)
    K8s-->>Trustee: Secrets owned by Machines
    loop For each machine Secret id
        Trustee->>Trustee: send_secret(client, id)
        Trustee->>K8s: Get Secret id
        K8s-->>Trustee: Secret id (root data)
        Trustee->>K8s: Get Secret trustee-auth
        K8s-->>Trustee: Secret trustee-auth (private.key)
        Trustee->>KBS: set_resource(url, auth_key, resource_bytes, path, [])
        KBS-->>Trustee: Response
    end
    Trustee-->>SyncCtl: Ok
    SyncCtl-->>K8s: Action::await_change()
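The readiness gate shown in the resync diagram (only trigger a full resync once the trustee Deployment reports all desired replicas ready) can be sketched in plain Rust. The struct below merely models the two DeploymentStatus fields the diagram mentions, and trustee_is_ready is a hypothetical helper name for illustration, not the PR's actual function:

```rust
// Minimal model of the fields trustee_deployment_reconcile inspects;
// the real controller reads these from k8s-openapi's DeploymentStatus.
#[derive(Debug, Default)]
struct DeploymentStatus {
    replicas: Option<i32>,
    ready_replicas: Option<i32>,
}

/// Returns true only when every desired replica reports ready, i.e. the
/// trustee pods have finished restarting and KBS is reachable again.
fn trustee_is_ready(status: &DeploymentStatus) -> bool {
    match (status.replicas, status.ready_replicas) {
        (Some(desired), Some(ready)) if desired > 0 => ready >= desired,
        _ => false,
    }
}

fn main() {
    // Mid-restart: one desired replica, none ready yet -> no resync.
    let mid_restart = DeploymentStatus { replicas: Some(1), ready_replicas: Some(0) };
    // Ready: desired count matches ready count -> trigger resync.
    let ready = DeploymentStatus { replicas: Some(1), ready_replicas: Some(1) };
    assert!(!trustee_is_ready(&mid_restart));
    assert!(trustee_is_ready(&ready));
    println!("ready check ok");
}
```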

Sequence diagram for LUKS key deletion via KBS API

sequenceDiagram
    participant K8s as KubernetesAPI
    participant RegSrv as RegisterServerController
    participant Trustee as TrusteeModule
    participant KBS as KBSService

    K8s-->>RegSrv: Machine deletion event
    RegSrv->>RegSrv: keygen_reconcile finalizer
    RegSrv->>Trustee: delete_secret(client, id)
    Trustee->>K8s: Get Secret trustee-auth
    K8s-->>Trustee: Secret trustee-auth (private.key)
    Trustee->>KBS: delete_resource(url, auth_key, path, [])
    KBS-->>Trustee: 200 OK
    Trustee-->>RegSrv: Ok
    RegSrv-->>K8s: Finalizer completed (Action::await_change())

Class diagram for new Ed25519 key pair and trustee auth secret generation

classDiagram
    class Ed25519KeyPair {
        +Vec~u8~ private_key_pem
        +Vec~u8~ public_key_pem
    }

    class TrusteeModule {
        +generate_ed25519_key_pair() Result~Ed25519KeyPair~
        +generate_trustee_auth_keys_secret(client, owner_reference) Result~()~
        +get_auth_key(client) Result~String~
        +send_secret(client, id) Result~()~
        +delete_secret(client, id) Result~()~
        +sync_all_machine_luks_key(client) Result~()~
        +launch_trustee_sync_controller(client) void
    }

    TrusteeModule --> Ed25519KeyPair : creates

File-Level Changes

Change Details Files
Replace trustee deployment patching with KBS API calls for LUKS key lifecycle (create/sync/delete).
  • Introduce get_auth_key helper to read the trustee auth private key from a namespaced Kubernetes Secret
  • Implement send_secret to fetch machine LUKS key Secret data and call kbs_client::set_resource against the KBS HTTP endpoint
  • Implement delete_secret to remove machine LUKS key resources via kbs_client::delete_resource
  • Update keygen_reconcile to use send_secret on create and delete_secret on cleanup instead of mount_secret/unmount_secret, and improve cleanup error reporting
operator/src/trustee.rs
operator/src/register_server.rs
operator/src/kbs-config.toml
operator/src/main.rs
operator/Cargo.toml
lib/src/endpoints.rs
Add trustee auth keypair generation and wiring so KBS API is authenticated via Ed25519 public key and key material is mounted into KBS pods.
  • Add Ed25519KeyPair type and generate_ed25519_key_pair helper with tests validating PEM format and uniqueness
  • Implement generate_trustee_auth_keys_secret to create the trustee-auth Secret with private and public keys owned by the cluster
  • Extend KBS volume templates to mount the trustee-auth Secret under /key, exposing only the public key file
  • Update kbs-config.toml admin section to use auth_public_key=/key/public.pub and switch the resource plugin type to kvstorage
  • Set the KBS deployment label selector to use the KBS_LABEL_SELECTOR constant and apply matching labels on the Deployment
operator/src/trustee.rs
operator/src/kbs-config.toml
operator/src/main.rs
lib/src/endpoints.rs
tests/Cargo.toml
Introduce a trustee deployment watcher/controller that resynchronizes all machine LUKS keys to KBS when trustee becomes ready.
  • Add sync_all_machine_luks_key to enumerate Machine-owned Secrets and push each to KBS via send_secret with logging and per-secret error handling
  • Implement trustee_deployment_reconcile to detect when the trustee Deployment has desired ready replicas and then trigger a full machine LUKS key resync
  • Add launch_trustee_sync_controller to start a kube-runtime Controller watching Deployments labeled with app=KBS_LABEL_SELECTOR, and wire it into main() startup
operator/src/trustee.rs
operator/src/main.rs
lib/src/endpoints.rs
Extend test coverage with an end-to-end LUKS key sync scenario and expose a helper for waiting on Deployment readiness.
  • Add test_luks_key_sync integration test that creates two Machines, waits for their Secrets, verifies both are sent to KBS from operator logs, restarts the trustee Deployment and checks resync logs, deletes one Machine and confirms its secret is removed from KBS, and performs cleanup
  • Expose TestContext::wait_for_deployment_ready as pub so tests can wait for trustee Deployment readiness
tests/trusted_execution_cluster.rs
test_utils/src/lib.rs
Tidy up dependencies and build/test support for the new KBS client and tests.
  • Pin compute-pcrs-lib to a specific git revision for reproducibility
  • Add kbs-client git dependency to the operator crate for calling the KBS API
  • Add chrono as a test dependency to support timestamped deployment restarts
  • Install perl in the Containerfile to satisfy new build or test requirements
Cargo.toml
operator/Cargo.toml
tests/Cargo.toml
Containerfile
Cargo.lock

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@sourcery-ai sourcery-ai Bot left a comment

Hey - I've found 2 issues, and left some high level feedback:

  • In launch_trustee_sync_controller, the watcher label_selector is set as "app={KBS_LABEL_SELECTOR}", which will not expand the constant and thus never match your KBS pods; this should be built with format!("app={KBS_LABEL_SELECTOR}") or similar.
  • The new test_luks_key_sync relies on grepping operator logs for specific hard-coded message substrings (including the exact count "Syncing 2 machine luks key to KBS"), which makes the test brittle to benign log or wording changes; consider asserting behavior via the KBS state or Kubernetes resources instead of log contents where possible.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `launch_trustee_sync_controller`, the watcher `label_selector` is set as `"app={KBS_LABEL_SELECTOR}"`, which will not expand the constant and thus never match your KBS pods; this should be built with `format!("app={KBS_LABEL_SELECTOR}")` or similar.
- The new `test_luks_key_sync` relies on grepping operator logs for specific hard-coded message substrings (including the exact count `"Syncing 2 machine luks key to KBS"`), which makes the test brittle to benign log or wording changes; consider asserting behavior via the KBS state or Kubernetes resources instead of log contents where possible.

## Individual Comments

### Comment 1
<location path="operator/src/trustee.rs" line_range="244" />
<code_context>
+pub async fn launch_trustee_sync_controller(client: Client) {
+    let deployments: Api<Deployment> = Api::default_namespaced(client.clone());
+    let watcher_config = watcher::Config {
+        label_selector: Some("app={KBS_LABEL_SELECTOR}".to_string()),
+        ..Default::default()
+    };
</code_context>
<issue_to_address>
**issue (bug_risk):** The label selector string is not interpolating `KBS_LABEL_SELECTOR` and will not match any pods.

Because the selector is the literal string `"app={KBS_LABEL_SELECTOR}"`, the controller will never match any `kbs`-labelled resources. Build the selector string dynamically (e.g. `format!("app={}", KBS_LABEL_SELECTOR)` or a const with the expanded value) so it actually aligns with the deployment and service labels.
</issue_to_address>

### Comment 2
<location path="tests/trusted_execution_cluster.rs" line_range="604-613" />
<code_context>
+    // Delete machine1 and verify its secret is removed from both K8s and KBS
</code_context>
<issue_to_address>
**issue (testing):** Also assert that the machine Secret is deleted from Kubernetes, not only from KBS logs

Right now the test only checks the operator logs for `"Secret {id1} deleted successfully"`, which covers the KBS side. To match the comment and fully validate behavior, please also assert that the corresponding Kubernetes Secret no longer exists (for example, `secrets_api.get(&machine1_uuid).await` returns `NOT_FOUND` or fails via the Poller).
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread operator/src/trustee.rs
pub async fn launch_trustee_sync_controller(client: Client) {
let deployments: Api<Deployment> = Api::default_namespaced(client.clone());
let watcher_config = watcher::Config {
label_selector: Some("app={KBS_LABEL_SELECTOR}".to_string()),

issue (bug_risk): The label selector string is not interpolating KBS_LABEL_SELECTOR and will not match any pods.

Because the selector is the literal string "app={KBS_LABEL_SELECTOR}", the controller will never match any kbs-labelled resources. Build the selector string dynamically (e.g. format!("app={}", KBS_LABEL_SELECTOR) or a const with the expanded value) so it actually aligns with the deployment and service labels.
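A minimal demonstration of the bug and the suggested fix (the value "kbs" for KBS_LABEL_SELECTOR is an assumption here, inferred from the original app=kbs selector):

```rust
// In a plain string literal, `{KBS_LABEL_SELECTOR}` is NOT interpolated;
// the braces stay verbatim. format! with Rust 2021 inline capture expands it.
const KBS_LABEL_SELECTOR: &str = "kbs"; // assumed value, per the original "app=kbs"

fn main() {
    let broken = "app={KBS_LABEL_SELECTOR}".to_string();
    let fixed = format!("app={KBS_LABEL_SELECTOR}");

    // The broken selector keeps literal braces and matches no pods.
    assert_eq!(broken, "app={KBS_LABEL_SELECTOR}");
    // The fixed selector aligns with the labels on the KBS deployment/service.
    assert_eq!(fixed, "app=kbs");
    println!("{fixed}");
}
```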

Comment on lines +604 to +613
// Delete machine1 and verify its secret is removed from both K8s and KBS
machines
.delete(&machine1_name, &Default::default())
.await?;
test_ctx.info(format!("Deleted Machine 1: {machine1_name}"));

let poller = Poller::new()
.with_timeout(Duration::from_secs(60))
.with_interval(Duration::from_secs(2))
.with_error_message("Machine1 secret not deleted from KBS".to_string());

issue (testing): Also assert that the machine Secret is deleted from Kubernetes, not only from KBS logs

Right now the test only checks the operator logs for "Secret {id1} deleted successfully", which covers the KBS side. To match the comment and fully validate behavior, please also assert that the corresponding Kubernetes Secret no longer exists (for example, secrets_api.get(&machine1_uuid).await returns NOT_FOUND or fails via the Poller).

